We will do a clustering analysis on the iris data set, using the k-means technique and a hierarchical clustering approach
The iris data set is one of R’s built-in data-sets, and is also available at the UCI Machine Learning Repository.
The data set was created by statistician Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems as an example of linear discriminant analysis.
The data consists of 3 classes of flower types:
setosa
virginica
versicolor
The data has 4 attributes:
sepal width
sepal length
petal width
petal length
This famous (Fisher’s or Anderson’s) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
The built-in data set iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.

library(tidyverse)
Let us first create a plot using the variables Petal.Width and Petal.Width, coloring every point by their class:
ggplot(iris, aes(x = Petal.Width, y = Petal.Length, color = Species)) +
geom_point()

Using k-means
Let us try to find the 3 groups using k-means
# use kmeans() with the subset of columns we want to consider
iris_mini <- iris[, c("Petal.Width", "Petal.Length")]
km_iris <- kmeans(x = iris_mini, centers = 3)
The object km_iris contains an attribute called cluster that we want to extract to see the different assignments:
str(km_iris)
List of 9
$ cluster : int [1:150] 3 3 3 3 3 3 3 3 3 3 ...
$ centers : num [1:3, 1:2] 1.342 2.037 0.246 4.269 5.596 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:3] "1" "2" "3"
.. ..$ : chr [1:2] "Petal.Width" "Petal.Length"
$ totss : num 551
$ withinss : num [1:3] 13.06 16.29 2.02
$ tot.withinss: num 31.4
$ betweenss : num 520
$ size : int [1:3] 52 48 50
$ iter : int 3
$ ifault : int 0
- attr(*, "class")= chr "kmeans"
Let us plot the points again, now coloring them based on the clusters found by the k-means algorithm:
- First convert the assigment to a factor variable:
- Now plot the points colored by their clusters from the k-means procedure

Using hierarchical clustering
We will perform the same task now using hiercarchical clustering. For this we use the function hclust()
hc_iris
Call:
hclust(d = dis, method = "single")
Cluster method : single
Distance : euclidean
Number of objects: 150
Explore the structure of the hc_iris object:
str(hc_iris)
List of 7
$ merge : int [1:149, 1:2] -1 -5 -9 -29 -34 -48 -50 -3 -39 -43 ...
$ height : num [1:149] 0 0 0 0 0 0 0 0 0 0 ...
$ order : int [1:150] 45 25 23 14 44 15 36 24 6 27 ...
$ labels : NULL
$ method : chr "single"
$ call : language hclust(d = dis, method = "single")
$ dist.method: chr "euclidean"
- attr(*, "class")= chr "hclust"
To find the cluster assignments in this case, let us cut the dendrogram tree to the proper number of clusters using the cutree() (in our case 3)
And now recreate the plot with the assignment provided by the hierarchical clustering technique

- Maybe if we try a different linkage we can obtain better results:
# hierarchical clustering using complete linkage
dis2 <- dist(iris_mini)
# use distance matrix as input for clustering method
hc_iris2 <- hclust(dis2, method = "complete")
- Get new assignments from the complete linkage option
# use cutree() to cut the dendrogram
newcut3 <- cutree(hc_iris2, k = 3)
# save the assignment as a factor
hc_results2 <- iris_mini %>%
mutate(cluster = factor(newcut3))
- And now recreate the plot with the assignment provided by the hierarchical clustering technique

LS0tCnRpdGxlOiAiSy1tZWFucyBhbmQgSGllcmFyY2hpY2FsIENsdXN0ZXJpbmcgb24gdGhlIGBpcmlzYCBEYXRhc2V0IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpXZSB3aWxsIGRvIGEgY2x1c3RlcmluZyBhbmFseXNpcyBvbiB0aGUgYGlyaXNgIGRhdGEgc2V0LCB1c2luZyB0aGUgay1tZWFucyB0ZWNobmlxdWUgYW5kIGEgaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmcgYXBwcm9hY2ggCgpUaGUgaXJpcyBkYXRhIHNldCBpcyBvbmUgb2YgUidzIGJ1aWx0LWluIGRhdGEtc2V0cywgYW5kIGlzIGFsc28gYXZhaWxhYmxlIGF0IHRoZSBbVUNJIE1hY2hpbmUgTGVhcm5pbmcgUmVwb3NpdG9yeV0oaHR0cDovL2FyY2hpdmUuaWNzLnVjaS5lZHUvbWwvKS4KClRoZSBkYXRhIHNldCB3YXMgY3JlYXRlZCBieSBzdGF0aXN0aWNpYW4gUm9uYWxkIEZpc2hlciBpbiBoaXMgMTkzNiBwYXBlciBUaGUgdXNlIG9mIG11bHRpcGxlIG1lYXN1cmVtZW50cyBpbiB0YXhvbm9taWMgcHJvYmxlbXMgYXMgYW4gZXhhbXBsZSBvZiBsaW5lYXIgZGlzY3JpbWluYW50IGFuYWx5c2lzLiAgCgpUaGUgZGF0YSBjb25zaXN0cyBvZiAqKjMgY2xhc3NlcyoqIG9mIGZsb3dlciB0eXBlczoKCiAgKiBzZXRvc2EKICAKICAqIHZpcmdpbmljYQogIAogICogdmVyc2ljb2xvcgogIApUaGUgZGF0YSBoYXMgKio0IGF0dHJpYnV0ZXMqKjoKICAKICAqIHNlcGFsIHdpZHRoCiAgCiAgKiBzZXBhbCBsZW5ndGgKICAKICAqIHBldGFsIHdpZHRoCiAgCiAgKiBwZXRhbCBsZW5ndGgKCgpUaGlzIGZhbW91cyAoRmlzaGVyJ3Mgb3IgQW5kZXJzb24ncykgaXJpcyBkYXRhIHNldCBnaXZlcyB0aGUgbWVhc3VyZW1lbnRzIGluIGNlbnRpbWV0ZXJzIG9mIHRoZSB2YXJpYWJsZXMgc2VwYWwgbGVuZ3RoIGFuZCB3aWR0aCBhbmQgcGV0YWwgbGVuZ3RoIGFuZCB3aWR0aCwgcmVzcGVjdGl2ZWx5LCBmb3IgNTAgZmxvd2VycyBmcm9tIGVhY2ggb2YgMyBzcGVjaWVzIG9mIGlyaXMuIFRoZSBzcGVjaWVzIGFyZSBJcmlzIHNldG9zYSwgdmVyc2ljb2xvciwgYW5kIHZpcmdpbmljYS4KClRoZSBidWlsdC1pbiBkYXRhIHNldCBgaXJpc2AgaXMgYSBkYXRhIGZyYW1lIHdpdGggMTUwIGNhc2VzIChyb3dzKSBhbmQgNSB2YXJpYWJsZXMgKGNvbHVtbnMpIG5hbWVkIGBTZXBhbC5MZW5ndGhgLCBgU2VwYWwuV2lkdGhgLCBgUGV0YWwuTGVuZ3RoYCwgYFBldGFsLldpZHRoYCwgYW5kIGBTcGVjaWVzYC4KCiFbXShodHRwczovL3MzLmFtYXpvbmF3cy5jb20vYXNzZXRzLmRhdGFjYW1wLmNvbS9ibG9nX2Fzc2V0cy9pcmlzLW1hY2hpbmVsZWFybmluZy5wbmcpCgoKYGBge3J9CmxpYnJhcnkodGlkeXZlcnNlKQpgYGAKCgpMZXQgdXMgZmlyc3QgY3JlYXRlIGEgcGxvdCB1c2luZyB0aGUgdmFyaWFibGVzIGBQZXRhbC5XaWR0aGAgYW5kIGBQZXRhbC5XaWR0aGAsIGNvbG9yaW5nIGV2ZXJ5IHBvaW50IGJ5IHRoZWlyIGNsYXNzOgoKYGBge3J9CmdncGxvdChpcmlzLCBhZXMoeCA9IFBldGFsLldpZHRoLCB5ID0gUGV0YWwuTGVuZ3RoLCBjb2xvciA9IFNwZWNpZXMpKSArCiAgZ2VvbV9wb2ludCgpCmBgYAoKCiMjIFVzaW5nIGstbWVhbnMKCkxldCB1cyB0cnkgdG8gZmluZCB0aGUgMyBncm91cHMgdXNpbmcgay1tZWFucyAKCmBgYHtyfQojIHVzZSBrbWVhbnMoKSB3aXRoIHRoZSBzdWJzZXQgb2YgY29sdW1ucyB3ZSB3YW50IHRvIGNvbnNpZGVyCmlyaXNfbWluaSA8LSBpcmlzWywgYygiUGV0YWwuV2lkdGgiLCAiUGV0YWwuTGVuZ3RoIildCmttX2lyaXMgPC0ga21lYW5zKHggPSBpcmlzX21pbmksIGNlbnRlcnMgPSAzKQpgYGAKClRoZSBvYmplY3QgYGttX2lyaXNgIGNvbnRhaW5zIGFuIGF0dHJpYnV0ZSBjYWxsZWQgYGNsdXN0ZXJgIHRoYXQgd2Ugd2FudCB0byBleHRyYWN0IHRvIHNlZSB0aGUgZGlmZmVyZW50IGFzc2lnbm1lbnRzOiAKCmBgYHtyfQpzdHIoa21faXJpcykKYGBgCgpMZXQgdXMgcGxvdCB0aGUgcG9pbnRzIGFnYWluLCBub3cgY29sb3JpbmcgdGhlbSBiYXNlZCBvbiB0aGUgY2x1c3RlcnMgZm91bmQgYnkgdGhlIGstbWVhbnMgYWxnb3JpdGhtOgoKLSBGaXJzdCBjb252ZXJ0IHRoZSBhc3NpZ21lbnQgdG8gYSBfZmFjdG9yXyB2YXJpYWJsZToKCmBgYHtyfQprbV9yZXN1bHRzIDwtIGlyaXNfbWluaSAlPiUKICAgICAgICAgICAgICAgIG11dGF0ZShjbHVzdGVyID0gZmFjdG9yKGttX2lyaXMkY2x1c3RlcikpCmttX3Jlc3VsdHMKYGBgCgotIE5vdyBwbG90IHRoZSBwb2ludHMgY29sb3JlZCBieSB0aGVpciBjbHVzdGVycyBmcm9tIHRoZSBrLW1lYW5zIHByb2NlZHVyZQoKYGBge3J9CiMgay1tZWFucyByZXN1bHRzCmttX3Jlc3VsdHMgJT4lCiAgZ2dwbG90KCkgKwogIGdlb21fcG9pbnQoYWVzKHggPSBQZXRhbC5XaWR0aCwgeSA9IFBldGFsLkxlbmd0aCwgY29sb3IgPSBjbHVzdGVyKSkKYGBgCgoKIyMgVXNpbmcgaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmcKCgpXZSB3aWxsIHBlcmZvcm0gdGhlIHNhbWUgdGFzayBub3cgdXNpbmcgaGllcmNhcmNoaWNhbCBjbHVzdGVyaW5nLgpGb3IgdGhpcyB3ZSB1c2UgdGhlIGZ1bmN0aW9uIGBoY2x1c3QoKWAgCgpgYGB7cn0KIyBoaWVyYXJjaGljYWwgY2x1c3RlcmluZyB3aXRoIHNpbmdsZSBsaW5rYWdlCmRpcyA8LSBkaXN0KGlyaXNfbWluaSkKIyB1c2UgZGlzdGFuY2UgbWF0cml4IGFzIGlucHV0IGZvciBjbHVzdGVyaW5nIG1ldGhvZApoY19pcmlzIDwtIGhjbHVzdChkaXMsIG1ldGhvZCA9ICJzaW5nbGUiKQpoY19pcmlzCmBgYAoKRXhwbG9yZSB0aGUgc3RydWN0dXJlIG9mIHRoZSBgaGNfaXJpc2Agb2JqZWN0OgoKYGBge3J9CnN0cihoY19pcmlzKQpgYGAKClRvIGZpbmQgdGhlIGNsdXN0ZXIgYXNzaWdubWVudHMgaW4gdGhpcyBjYXNlLCBsZXQgdXMgY3V0IHRoZSBkZW5kcm9ncmFtIHRyZWUgdG8gdGhlIHByb3BlciBudW1iZXIgb2YgY2x1c3RlcnMgdXNpbmcgdGhlIGBjdXRyZWUoKWAgKGluIG91ciBjYXNlIDMpCgpgYGB7cn0KIyB1c2UgY3V0cmVlKCkgdG8gY3V0IHRoZSBkZW5kcm9ncmFtCmN1dDMgPC0gY3V0cmVlKGhjX2lyaXMsIGsgPSAzKQojIHNhdmUgdGhlIGFzc2lnbm1lbnQgYXMgYSBmYWN0b3IKaGNfcmVzdWx0cyA8LSBpcmlzX21pbmkgJT4lCiAgICAgICAgICAgICAgICBtdXRhdGUoY2x1c3RlciA9IGZhY3RvcihjdXQzKSkKaGNfcmVzdWx0cwpgYGAKCkFuZCBub3cgcmVjcmVhdGUgdGhlIHBsb3Qgd2l0aCB0aGUgYXNzaWdubWVudCBwcm92aWRlZCBieSB0aGUgaGllcmFyY2hpY2FsIGNsdXN0ZXJpbmcgdGVjaG5pcXVlCgpgYGB7cn0KaGNfcmVzdWx0cyAlPiUKICBnZ3Bsb3QoKSArCiAgZ2VvbV9wb2ludChhZXMoeCA9IFBldGFsLldpZHRoLCB5ID0gUGV0YWwuTGVuZ3RoLCBjb2xvciA9IGNsdXN0ZXIpKQpgYGAKCi0gTWF5YmUgaWYgd2UgdHJ5IGEgZGlmZmVyZW50IF9saW5rYWdlXyB3ZSBjYW4gb2J0YWluIGJldHRlciByZXN1bHRzOiAKCgpgYGB7cn0KIyBoaWVyYXJjaGljYWwgY2x1c3RlcmluZyB1c2luZyBjb21wbGV0ZSBsaW5rYWdlCmRpczIgPC0gZGlzdChpcmlzX21pbmkpCiMgdXNlIGRpc3RhbmNlIG1hdHJpeCBhcyBpbnB1dCBmb3IgY2x1c3RlcmluZyBtZXRob2QKaGNfaXJpczIgPC0gaGNsdXN0KGRpczIsIG1ldGhvZCA9ICJjb21wbGV0ZSIpCmBgYAoKCi0gR2V0IG5ldyBhc3NpZ25tZW50cyBmcm9tIHRoZSBjb21wbGV0ZSBsaW5rYWdlIG9wdGlvbgoKYGBge3J9CiMgdXNlIGN1dHJlZSgpIHRvIGN1dCB0aGUgZGVuZHJvZ3JhbQpuZXdjdXQzIDwtIGN1dHJlZShoY19pcmlzMiwgayA9IDMpCiMgc2F2ZSB0aGUgYXNzaWdubWVudCBhcyBhIGZhY3RvcgpoY19yZXN1bHRzMiA8LSBpcmlzX21pbmkgJT4lCiAgICAgICAgICAgICAgICBtdXRhdGUoY2x1c3RlciA9IGZhY3RvcihuZXdjdXQzKSkKYGBgCgoKLSBBbmQgbm93IHJlY3JlYXRlIHRoZSBwbG90IHdpdGggdGhlIGFzc2lnbm1lbnQgcHJvdmlkZWQgYnkgdGhlIGhpZXJhcmNoaWNhbCBjbHVzdGVyaW5nIHRlY2huaXF1ZQoKYGBge3J9CiMgY3JlYXRlIG5ldyBwbG90CmhjX3Jlc3VsdHMyICU+JQogIGdncGxvdCgpICsKICBnZW9tX3BvaW50KGFlcyh4ID0gUGV0YWwuV2lkdGgsIHkgPSBQZXRhbC5MZW5ndGgsIGNvbG9yID0gY2x1c3RlcikpCmBgYAoKCg==